Apache Sqoop Cookbook: Unlocking Hadoop for Your Relational Database
Integrating data from multiple sources is essential in the age of big data, but it can be a challenging and time-consuming task. This handy cookbook provides dozens of ready-to-use recipes for using Apache Sqoop, the command-line interface application that optimizes data transfers between relational databases and Hadoop.
Sqoop is both powerful and bewildering, but with this cookbook's problem-solution-discussion format, you'll quickly learn how to deploy and then apply Sqoop in your environment. The authors provide MySQL, Oracle, and PostgreSQL database examples on GitHub that you can easily adapt for SQL Server, Netezza, Teradata, or other relational systems.
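For a flavor of the tool, here is a minimal import sketch; the JDBC URL, credentials, and table name are illustrative placeholders rather than values from the book:

    # Copy the entire "cities" table from a MySQL database into HDFS.
    sqoop import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --table cities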
Transfer data from a single database table into your Hadoop ecosystem
Keep table data and Hadoop in sync by importing data incrementally (see the sketch after this list)
Import data from more than one database table
Customize transferred data by calling various database functions
Export generated, processed, or backed-up data from Hadoop to your database
Run Sqoop within Oozie, Hadoop's specialized workflow scheduler
Load data into Hadoop's data warehouse (Hive) or database (HBase)
Handle installation, connection, and syntax issues common to specific database vendors
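As an example of the incremental workflow mentioned above, a hedged sketch (the table and column names are assumptions for illustration):

    # Import only rows whose "id" column exceeds the last imported value.
    sqoop import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --table visits \
      --incremental append \
      --check-column id \
      --last-value 1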
About the Authors
Kathleen Ting is currently a customer operations engineering manager at Cloudera, where she helps customers deploy and use the Hadoop ecosystem in production. She has spoken on Hadoop, ZooKeeper, and Sqoop at many big data conferences, including Hadoop World, ApacheCon, and OSCON. She has contributed to several projects in the open source community and is a committer and PMC member on Sqoop.
Jarek Jarcec Cecho is currently a software engineer at Cloudera, where he develops software to help customers better access and integrate with the Hadoop ecosystem. He has led the Sqoop community in the architecture of the next generation of Sqoop, known as Sqoop 2. He has contributed to several projects in the open source community and is a committer and PMC member on Sqoop, Flume, and MRUnit.
Table of Contents:
Chapter 1: Getting Started
Downloading and Installing Sqoop
Installing JDBC Drivers
Installing Specialized Connectors
Starting Sqoop
Getting Help with Sqoop
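To go with Chapter 1, a quick sanity check of a fresh installation might look like the following; these are standard Sqoop commands:

    # Print the installed version, list available tools, and show
    # tool-specific usage.
    sqoop version
    sqoop help
    sqoop help import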
Chapter 2: Importing Data
Transferring an Entire Table
Specifying a Target Directory
Importing Only a Subset of Data
Protecting Your Password
Using a File Format Other Than CSV
Compressing Imported Data
Speeding Up Transfers
Overriding Type Mapping
Controlling Parallelism
Encoding NULL Values
Importing All Your Tables
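A sketch combining several Chapter 2 recipes (subset selection, password prompting, parallelism, and compression); the connection details and table name are placeholders:

    # Import only US rows of "cities" into a chosen HDFS directory,
    # prompting for the password instead of passing it on the command line.
    sqoop import \
      --connect jdbc:postgresql://postgres.example.com/sqoop \
      --username sqoop \
      -P \
      --table cities \
      --where "country = 'USA'" \
      --target-dir /etl/input/cities \
      --num-mappers 4 \
      --compress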
Chapter 3: Incremental Import
Importing Only New Data
Incrementally Importing Mutable Data
Preserving the Last Imported Value
Storing Passwords in the Metastore
Overriding the Arguments to a Saved Job
Sharing the Metastore Between Sqoop Clients
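Chapter 3's saved-job workflow might look like this sketch (the job, table, and column names are hypothetical); Sqoop's metastore remembers the last imported value between runs:

    # Create a reusable incremental import job, then execute it.
    sqoop job \
      --create visits_import \
      -- \
      import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --table visits \
      --incremental append \
      --check-column id \
      --last-value 0
    sqoop job --exec visits_import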
Chapter 4: Free-Form Query Import
Importing Data from Two Tables
Using Custom Boundary Queries
Renaming Sqoop Job Instances
Importing Queries with Duplicated Columns
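A hedged free-form query sketch in the spirit of Chapter 4; the literal $CONDITIONS token is required so Sqoop can split the query across mappers, and the join shown is illustrative:

    # Import the result of a two-table join rather than a single table.
    sqoop import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --query 'SELECT c.id, c.city, o.country FROM cities c JOIN countries o ON c.country_id = o.id WHERE $CONDITIONS' \
      --split-by c.id \
      --target-dir /etl/input/cities_countries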
Chapter 5: Export
Transferring Data from Hadoop
Inserting Data in Batches
Exporting with All-or-Nothing Semantics
Updating an Existing Data Set
Updating or Inserting at the Same Time
Using Stored Procedures
Exporting into a Subset of Columns
Encoding the NULL Value Differently
Exporting Corrupted Data
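An export sketch matching Chapter 5's update-or-insert recipe (an upsert, supported by connectors such as MySQL's); the export directory and key column are assumptions:

    # Push HDFS data back into a relational table, updating rows that
    # match on "id" and inserting the rest.
    sqoop export \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --table cities \
      --export-dir /etl/output/cities \
      --update-key id \
      --update-mode allowinsert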
Chapter 6: Hadoop Ecosystem Integration
Scheduling Sqoop Jobs with Oozie
Specifying Commands in Oozie
Using Property Parameters in Oozie
Installing JDBC Drivers in Oozie
Importing Data Directly into Hive
Using Partitioned Hive Tables
Replacing Special Delimiters During Hive Import
Using the Correct NULL String in Hive
Importing Data into HBase
Importing All Rows into HBase
Improving Performance When Importing into HBase
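For Chapter 6, hedged one-liners for the Hive and HBase integrations (table and column-family names are placeholders):

    # Import straight into a Hive table that Sqoop creates and loads.
    sqoop import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --table cities \
      --hive-import

    # Import into HBase instead, writing all columns into one column family.
    sqoop import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --table cities \
      --hbase-table cities \
      --column-family world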
Chapter 7: Specialized Connectors
Overriding Imported boolean Values in PostgreSQL Direct Import
Importing a Table Stored in Custom Schema in PostgreSQL
Exporting into PostgreSQL Using pg_bulkload
Connecting to MySQL
Using Direct MySQL Import into Hive
Using the upsert Feature When Exporting into MySQL
Importing from Oracle
Using Synonyms in Oracle
Faster Transfers with Oracle
Importing into Avro with OraOop
Choosing the Proper Connector for Oracle
Exporting into Teradata
Using the Cloudera Teradata Connector
Using Long Column Names in Teradata
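Finally, a Chapter 7-style sketch: MySQL's direct mode bypasses JDBC in favor of the native mysqldump utility, which must be installed on the worker nodes; the connection details are placeholders:

    # Faster MySQL import using the native dump tool instead of JDBC.
    sqoop import \
      --connect jdbc:mysql://mysql.example.com/sqoop \
      --username sqoop \
      --table cities \
      --direct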
Authors: Kathleen Ting, Jarek Jarcec Cecho
Publisher: O'Reilly
ISBN: 9789351103103
Store book number: 105
Price: NRS 320.00